On the k-Means/Median Cost Function

Authors

  • Anup Bhattacharya
  • Ragesh Jaiswal
Abstract

In this work, we study the k-means cost function. The (Euclidean) k-means problem can be described as follows: given a dataset $X \subseteq \mathbb{R}^d$ and a positive integer $k$, find a set of $k$ centers $C \subseteq \mathbb{R}^d$ such that $\Phi(C, X) := \sum_{x \in X} \min_{c \in C} \|x - c\|^2$ is minimized. Let $\Delta_k(X) := \min_{C \subseteq \mathbb{R}^d,\, |C| = k} \Phi(C, X)$ denote the cost of the optimal k-means solution. It is easy to observe that for any dataset $X$, $\Delta_k(X)$ decreases as $k$ increases. We try to understand this behaviour more precisely. For any dataset $X \subseteq \mathbb{R}^d$, integer $k \geq 1$, and small precision parameter $\varepsilon > 0$, let $L^X_{k,\varepsilon}$ denote the smallest integer such that $\Delta_{L^X_{k,\varepsilon}}(X) \leq \varepsilon \cdot \Delta_k(X)$. We show upper and lower bounds on this quantity. Our techniques generalize to the metric k-median problem in arbitrary metrics, and we give bounds in terms of the doubling dimension of the metric. Finally, we observe that for any dataset $X$, we can compute a set $S$ of size $O(L^X_{k,\varepsilon/c})$ such that $\Phi(S, X) \leq \varepsilon \cdot \Delta_k(X)$ using the $D^2$-sampling algorithm, which is also known as the k-means++ seeding procedure. Here, $c$ is some fixed constant.

Some applications of our bounds are as follows:

1. Pseudo-approximation of k-means++: Analysing the approximation and pseudo-approximation guarantees of k-means++ seeding has been a popular research topic. The goal has been to understand how the cost behaves as a function of the number of centers sampled by this algorithm. Our results may be seen as a non-trivial addition to the current state of knowledge.
2. Sampling-based coreset for k-means: Our bounds imply that any constant-factor approximation algorithm, when executed with $O(L^X_{k,\varepsilon^2/c})$ clusters, gives a $(k, \varepsilon)$-coreset for the k-means problem. In particular, any set $S$ of size $O(L^X_{k,\varepsilon^2/c})$ sampled with $D^2$-sampling is a $(k, \varepsilon)$-coreset. This improves on similar results of Ackermann et al. [1].
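The quantities above can be made concrete with a small sketch. The Python code below is an illustrative assumption, not the authors' implementation: the hypothetical helper `kmeans_cost` evaluates $\Phi(C, X)$ directly from its definition, and the hypothetical helper `d2_sample` is a plain version of $D^2$-sampling (k-means++ seeding), in which the first center is drawn uniformly from $X$ and each later center is drawn with probability proportional to its squared distance to the nearest center chosen so far. The toy dataset is also made up. Computing $\Delta_k(X)$ exactly is NP-hard in general, so the sketch only reports $\Phi(S, X)$ for sampled sets $S$, which is an upper bound on $\Delta_{|S|}(X)$.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(centers: List[Point], data: List[Point]) -> float:
    """Phi(C, X): sum over x in X of the squared distance to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in data)


def d2_sample(data: List[Point], m: int, rng: random.Random) -> List[Point]:
    """Hypothetical sketch of D^2-sampling (k-means++ seeding) for m centers.

    The first center is chosen uniformly at random from the data; every later
    center is chosen with probability proportional to its current squared
    distance to the nearest center picked so far.
    """
    centers = [rng.choice(data)]
    # d2[i] holds the squared distance from data[i] to its nearest chosen center.
    d2 = [sq_dist(x, centers[0]) for x in data]
    while len(centers) < m:
        if sum(d2) == 0:  # every point already coincides with some center
            break
        idx = rng.choices(range(len(data)), weights=d2, k=1)[0]
        new_center = data[idx]
        centers.append(new_center)
        # Adding a center can only shrink each point's nearest-center distance.
        d2 = [min(d, sq_dist(x, new_center)) for d, x in zip(d2, data)]
    return centers


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: three tight groups of 2-D points.
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
    S = d2_sample(X, m=6, rng=rng)
    # Cost of the first m sampled centers, for m = 1, ..., |S|.
    for m in range(1, len(S) + 1):
        print(m, round(kmeans_cost(S[:m], X), 4))
```

Because each prefix of a single sampling run only adds centers to the previous prefix, the printed costs never increase with $m$, mirroring the monotonicity of $\Delta_k(X)$ in $k$. The abstract's final result concerns how many sampled centers suffice to bring this cost down to $\varepsilon \cdot \Delta_k(X)$.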

Similar resources

Improving the customer clustering algorithm for spare-parts distribution using a data mining approach (k-means)

Customer classification using k-means algorithm for optimizing the transportation plans is one of the most interesting subjects in the Customer Relationship Management context. In this paper, the real-world data and information for a spare-parts distribution company (ISACO) during the past 36 months has been investigated and these figures have been evaluated using k-means tool developed for spa...

Effect of Objective Function on the Optimization of Highway Vertical Alignment by Means of Metaheuristic Algorithms

The main purpose of this work is the comparison of several objective functions for optimization of the vertical alignment. To this end, after formulation of optimum vertical alignment problem based on different constraints, the objective function was considered as four forms including: 1) the sum of the absolute value of variance between the vertical alignment and the existing ground; 2) the su...

Batch and median neural gas

Neural Gas (NG) constitutes a very robust clustering algorithm given Euclidean data which does not suffer from the problem of local minima like simple vector quantization, or topological restrictions like the self-organizing map. Based on the cost function of NG, we introduce a batch variant of NG which shows much faster convergence and which can be interpreted as an optimization of the cost fu...

Clustering with Intelligent Linex k-Means

The intelligent LINEX k-means clustering is a generalization of k-means clustering in which the number of clusters and their centroids can be determined while the LINEX loss function is used as the dissimilarity measure. Therefore, the selection of the centers in each cluster is not random. Choosing the LINEX dissimilarity measure helps the researcher to overestimate or undere...

Multi-objective optimization approach for cost management during product design at the conceptual phase

The effective cost management during the conceptual design phase of a product is essential to develop a product with minimum cost and desired quality. The integration of the methodologies of quality function deployment (QFD), value engineering (VE) and target costing (TC) could be applied to the continuous improvement of any product during product development. To optimize customer satisfaction ...

Journal:
  • CoRR

Volume: abs/1704.05232

Publication date: 2017